3 research outputs found
Information-theoretic Feature Selection via Tensor Decomposition and Submodularity
Feature selection by maximizing high-order mutual information between the
selected feature vector and a target variable is the gold standard in terms of
selecting the best subset of relevant features that maximizes the performance
of prediction models. However, such an approach typically requires knowledge of
the multivariate probability distribution of all features and the target, and
involves a challenging combinatorial optimization problem. Recent work has
shown that any joint Probability Mass Function (PMF) can be represented as a
naive Bayes model, via Canonical Polyadic (tensor rank) Decomposition. In this
paper, we introduce a low-rank tensor model of the joint PMF of all variables
and indirect targeting as a way of mitigating complexity and maximizing the
classification performance for a given number of features. Through low-rank
modeling of the joint PMF, it is possible to circumvent the curse of
dimensionality by learning principal components of the joint distribution. By
indirectly aiming to predict the latent variable of the naive Bayes model
instead of the original target variable, it is possible to formulate the
feature selection problem as maximization of a monotone submodular function
subject to a cardinality constraint - which can be tackled using a greedy
algorithm that comes with performance guarantees. Numerical experiments with
several standard datasets suggest that the proposed approach compares favorably
to the state of the art for this important problem.
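As a minimal sketch of the greedy machinery the abstract invokes, the snippet below runs the standard greedy algorithm for maximizing a monotone submodular function under a cardinality constraint (the setting with the classical (1 - 1/e) approximation guarantee). The coverage objective used here is a common textbook stand-in, not the paper's PMF-based objective, and all names are illustrative.

```python
# Greedy maximization of a monotone submodular function f under |S| <= k.
# The (1 - 1/e) guarantee of Nemhauser-Wolsey-Fisher applies in this setting.

def greedy_select(ground_set, f, k):
    """Pick k elements greedily: at each step add the element whose
    marginal gain f(S + [e]) - f(S) is largest."""
    S = []
    for _ in range(k):
        gains = {e: f(S + [e]) - f(S) for e in ground_set if e not in S}
        S.append(max(gains, key=gains.get))
    return S

# Toy coverage objective: each "feature" covers a set of items and
# f(S) = |union of covered items|, which is monotone and submodular.
cover = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 2, 3, 4}}

def f(S):
    return len(set().union(*(cover[e] for e in S)) if S else set())

print(greedy_select(list(cover), f, 2))  # feature 3 is picked first
```

In the paper's formulation the objective would instead score feature subsets through the low-rank PMF model, but the greedy loop itself is unchanged.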
Low-rank Characteristic Tensor Density Estimation Part II: Compression and Latent Density Estimation
Learning generative probabilistic models is a core problem in machine
learning, which presents significant challenges due to the curse of
dimensionality. This paper proposes a joint dimensionality reduction and
non-parametric density estimation framework, using a novel estimator that can
explicitly capture the underlying distribution of appropriate reduced-dimension
representations of the input data. The idea is to jointly design a nonlinear
dimensionality reducing auto-encoder to model the training data in terms of a
parsimonious set of latent random variables, and learn a canonical low-rank
tensor model of the joint distribution of the latent variables in the Fourier
domain. The proposed latent density model is non-parametric and universal, as
opposed to the predefined prior that is assumed in variational auto-encoders.
Joint optimization of the auto-encoder and the latent density estimator is
pursued via a formulation which learns both by minimizing a combination of the
negative log-likelihood in the latent domain and the auto-encoder
reconstruction loss. We demonstrate that the proposed model achieves very
promising results on toy, tabular, and image datasets for regression,
sampling, and anomaly detection tasks.
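To make the combined objective concrete, here is a deliberately simplified sketch: a linear (PCA) auto-encoder stands in for the paper's nonlinear auto-encoder, and a diagonal Gaussian stands in for the low-rank characteristic-tensor latent density. Only the shape of the loss, reconstruction error plus negative log-likelihood in the latent domain, mirrors the abstract; everything else is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # toy data, 10-dimensional

# "Encoder"/"decoder": top-d principal directions (a linear auto-encoder
# with tied weights); the paper uses a nonlinear auto-encoder instead.
d = 3
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:d].T                            # (10, d) projection
Z = Xc @ W                              # latent representations
X_hat = Z @ W.T + X.mean(0)             # reconstructions

# Latent density: a diagonal Gaussian as a stand-in for the low-rank
# characteristic-tensor model; nll is the mean negative log-likelihood.
mu, var = Z.mean(0), Z.var(0)
nll = 0.5 * np.mean(
    np.sum((Z - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1))

recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
loss = recon + nll                      # the combined training objective
print(f"recon={recon:.3f}  nll={nll:.3f}  total={loss:.3f}")
```

In the actual framework both terms would be minimized jointly over the auto-encoder and density-model parameters, rather than fit in closed form as here.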
Construction and Decoding Techniques for Polar Codes
Summary: Polar codes, recently invented by Arikan, are the first provably capacity-achieving codes for any binary-input symmetric discrete memoryless channel
with low encoding and decoding complexity. This thesis explores practical
implementations of polar codes that are computationally efficient and perform
well on the binary erasure channel (BEC) and the binary symmetric channel (BSC).
The explicit code construction is based on a phenomenon called channel
polarization, which generates N extremal (either perfect or completely noisy)
channels from N independent uses of the same base channel. Information bits
are sent over the noiseless channels, while fixed, known bits, called frozen
bits, are assigned to the noisy ones. Code design for the BEC is based on the
recursive relations presented in the original paper, whereas for the BSC we
propose a heuristic and efficient algorithm and compare it to recursive
estimation of the Bhattacharyya parameters of the bit-channels. The encoding
is implemented using a recursive butterfly structure with O(N log N)
complexity, where N is the block length of the code. Two main low-complexity
decoders are compared in terms of bit error rate: the successive cancellation
decoder proposed by Arikan, with complexity O(N log N), which is susceptible
to error propagation and has mediocre bit-error-rate performance at small to
moderate code lengths, and the list decoder proposed by Tal and Vardy, with
complexity O(LN log N), where L is the list size.
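The two construction ingredients described above, the BEC recursive relations and the O(N log N) butterfly encoder, can be sketched directly. For the BEC with erasure probability ε, the Bhattacharyya parameter of a bit-channel evolves as Z⁻ = 2Z − Z² and Z⁺ = Z²; the encoder applies the kernel F = [[1,0],[1,1]] recursively. Index ordering conventions (natural vs. bit-reversed) differ between implementations, so the ordering here is one common choice, not necessarily the thesis's.

```python
def bec_bhattacharyya(eps, n):
    """Erasure probabilities of the 2**n bit-channels of a BEC(eps),
    via the recursion Z- = 2Z - Z**2 (worse) and Z+ = Z**2 (better)."""
    Z = [eps]
    for _ in range(n):
        Z = [w for z in Z for w in (2 * z - z * z, z * z)]
    return Z

def polar_encode(u):
    """Butterfly encoder computing x = u * F^{(x) n} over GF(2),
    O(N log N) XOR operations for block length N = len(u)."""
    x = list(u)
    n, step = len(u), 1
    while step < n:
        for i in range(0, n, 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]
        step *= 2
    return x

# Design a (4, 2) code for BEC(0.5): send data on the 2 most reliable
# bit-channels, freeze the rest to zero.
Z = bec_bhattacharyya(0.5, 2)
info = sorted(range(4), key=lambda i: Z[i])[:2]
print(Z)            # [0.9375, 0.5625, 0.4375, 0.0625]
print(sorted(info)) # [2, 3] -- the two lowest-erasure bit-channels
```

Note that F^{⊗n} is its own inverse over GF(2), so `polar_encode` applied twice returns the input, a convenient sanity check; successive cancellation and list decoding are substantially more involved and are not sketched here.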